Abstract

A novel algorithm for actively trading stocks is presented. While traditional expert 
advice and "universal" algorithms (as well as standard technical trading heuristics) attempt 
to predict winners or trends, our approach relies on predictable statistical relations between 
all pairs of stocks in the market. Our empirical results on historical markets provide strong 
evidence that this type of technical trading can "beat the market" and moreover, can beat 
the best stock in the market. In doing so we utilize a new idea for smoothing critical 
parameters in the context of expert learning. 

1. Introduction 

The portfolio selection (PS) problem is a challenging problem for machine learning, online 
algorithms and, of course, computational finance. As is well known (e.g. see Lugosi, 2001) 
sequence prediction under the log loss measure can be viewed as a special case of portfo- 
lio selection, and perhaps more surprisingly, from a certain worst case minimax criterion, 
portfolio selection is not essentially any harder (than prediction) as shown in (Cover & Or- 
dentlich, 1996) (see also Lugosi, 2001, Thm. 20 & 21). But there seems to be a qualitative 
difference between the practical utility of "universal" sequence prediction and "universal" 
portfolio selection. Simply stated, universal sequence prediction algorithms under various 
probabilistic and worst-case models appear to work very well in practice whereas the known 
universal portfolio selection algorithms do not seem to provide any substantial benefit over 
a naive investment strategy (see Section 5). 

A major pragmatic question is whether or not a computer program can consistently 
outperform the market. A closer inspection of the interesting ideas developed in informa- 
tion theory and online learning suggests that a promising approach is to exploit the natural 
volatility in the market and in particular to benefit from simple and rather persistent sta- 
tistical relations between stocks rather than to try to predict stock prices or "winners". 
We present a non-universal portfolio selection algorithm,^1 which does not try to predict
winners. The motivation behind our algorithm is the rationale behind constant rebalancing
algorithms and the worst case study of universal trading introduced by Cover (1991). Not
only does our proposed algorithm substantially "beat the market" on historical markets,
it also beats the best stock. So why are we presenting this algorithm and not just simply
making money? There are, of course, some caveats and obstacles to utilizing the algorithm.
But for large investors the possibility of a goose laying silver (if not golden) eggs is
perhaps not impossible.

2. The Portfolio Selection Problem 

Assume a market with $m$ stocks. Let $v_t = (v_t(1), \ldots, v_t(m))$ be the daily closing prices^2
of the $m$ stocks for the $t$th day, where $v_t(j)$ is the price of the $j$th stock. It is convenient
to work with relative prices $x_t(j) = v_t(j)/v_{t-1}(j)$ so that an investment of $d$ dollars in the
$j$th stock just before the $t$th day yields $d \, x_t(j)$ dollars. We let $x_t = (x_t(1), \ldots, x_t(m))$
denote the market vector of relative prices corresponding to the $t$th day. A portfolio $b$ is an
allocation of wealth in the stocks, specified by the proportions $b = (b(1), \ldots, b(m))$ of current
dollar wealth invested in each of the stocks, where $b(j) \ge 0$ and $\sum_j b(j) = 1$. The daily return
of a portfolio $b$ w.r.t. a market vector $x$ is $b \cdot x = \sum_j b(j) x(j)$ and the (compound) total
return, $ret_X(b_1, \ldots, b_n)$, of a sequence of portfolios $b_1, \ldots, b_n$ w.r.t. a market sequence
$X = x_1, \ldots, x_n$ is $\prod_{t=1}^{n} b_t \cdot x_t$. A portfolio selection algorithm $A$ is any
deterministic or randomized rule for specifying a sequence of portfolios and we let $ret_X(A)$ denote
its total return for the market sequence $X$.
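The definitions above can be illustrated with a minimal Python sketch. The function names and the price data are hypothetical and chosen only for illustration; the sketch simply converts closing prices into relative prices and compounds the daily returns $b_t \cdot x_t$.

```python
from math import prod

def relative_prices(prices):
    """Convert closing-price vectors v_0, v_1, ... into market vectors x_1, x_2, ..."""
    return [[today[j] / prev[j] for j in range(len(today))]
            for prev, today in zip(prices, prices[1:])]

def daily_return(b, x):
    """Daily return b . x of portfolio b on market vector x."""
    return sum(bj * xj for bj, xj in zip(b, x))

def total_return(portfolios, market):
    """Compound total return: the product of the daily returns b_t . x_t."""
    return prod(daily_return(b, x) for b, x in zip(portfolios, market))

# Made-up closing prices for two stocks over three days (illustrative only).
prices = [[10.0, 20.0], [11.0, 18.0], [12.1, 19.8]]
X = relative_prices(prices)
uniform = [0.5, 0.5]
wealth = total_return([uniform] * len(X), X)  # rebalance to uniform every day
```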

The simplest strategy is to "buy-and-hold" stocks using some portfolio $b$. We denote this
strategy by BAH_b and let U-BAH denote the uniform buy-and-hold when $b = (1/m, \ldots, 1/m)$.
We say that a portfolio selection algorithm "beats the market" when it outperforms U-BAH
on a given market sequence, although in practice "the market" can be represented by some
non-uniform BAH.^3 Buy-and-hold strategies rely on the tendency of successful markets to
grow. Much of modern portfolio theory focuses on how to choose a good $b$ for the
buy-and-hold strategy. The seminal ideas of Markowitz (1959) yield an algorithmic procedure
for choosing the weights of the portfolio $b$ so as to minimize the variance for any feasible
expected return. This variance minimization is possible by placing appropriate (larger)
weights on subsets of sufficiently anti-correlated stocks, an idea which we shall also utilize.
We denote the optimal in hindsight buy-and-hold strategy (i.e. invest only in the best
stock) by BAH*.

An alternative approach to the static buy-and-hold is to dynamically change the portfolio
during the trading period. This approach is often called "active trading". One example of
active trading is constant rebalancing; namely, fix a portfolio $b$ and (re)invest your dollars
each day according to $b$. We denote this constant rebalancing strategy by CBAL_b and let
CBAL* denote the optimal (in hindsight) CBAL. A constant rebalancing strategy can often



1. Any PS algorithm can be modified to be universal by investing any fixed fraction of the initial
wealth in a universal algorithm.

2. There is nothing special about "daily closing prices" and the problem can be defined with respect
to any (sub)sequence of the (intra-day) sequence of all price offers which appear in the stock market.

3. For example the Dow Jones Industrial Average (DJIA) is calculated as a non-uniform average of the
30 DJIA stocks; see e.g. http://www.dowjones.com/
take advantage of market fluctuations to achieve a return significantly greater than that of
BAH*. CBAL* is always at least as good as the best stock BAH*, and in some real market
sequences a constant rebalancing strategy will take advantage of market fluctuations and
significantly outperform the best stock (see e.g. Table 1). For now, consider Cover and
Gluss's (1986) classic (but contrived) example of a market consisting of cash and one stock
and the market sequence of price relatives $(1, 1/2), (1, 2), (1, 1/2), (1, 2), \ldots$ Now consider
the CBAL_b with $b = (1/2, 1/2)$. On each odd day the daily return of CBAL_b is
$\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot \frac{1}{2} = \frac{3}{4}$ and on each even
day, it is 3/2. The total return over $n$ days is therefore $(9/8)^{n/2}$, illustrating how a constant
rebalancing strategy can yield exponential returns in a "no-growth market". Under the
assumption that the daily market vectors are observations of identically and independently
distributed (i.i.d.) random variables, it is shown in (Cover & Thomas, 1991) that CBAL*
performs at least as well (in the sense of expected total return) as the best online portfolio
selection algorithm. However, many studies (see e.g. Lo & MacKinlay, 1999) argue that
stock price sequences do have long term memory and are not i.i.d.
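The Cover and Gluss example can be checked numerically. The following is a small sketch (the helper names are ours, not from the cited work) that compares constant rebalancing with buy-and-hold on the alternating cash/stock market described above.

```python
def cbal_wealth(b, market):
    """Wealth of constant rebalancing: reinvest according to b every day."""
    wealth = 1.0
    for x in market:
        wealth *= sum(bj * xj for bj, xj in zip(b, x))
    return wealth

def bah_wealth(b, market):
    """Wealth of buy-and-hold: split once according to b, then never trade."""
    holdings = list(b)
    for x in market:
        holdings = [h * xj for h, xj in zip(holdings, x)]
    return sum(holdings)

n = 20  # number of trading days (even, so the stock ends where it started)
# Asset 0 is cash (relative price 1); asset 1 halves on odd days, doubles on even days.
market = [(1.0, 0.5) if day % 2 == 0 else (1.0, 2.0) for day in range(n)]

cbal = cbal_wealth((0.5, 0.5), market)  # grows as (9/8) ** (n / 2)
bah = bah_wealth((0.5, 0.5), market)    # stays at 1 in this no-growth market
```

With these inputs the rebalanced wealth equals $(9/8)^{10}$, roughly 3.25, while the buy-and-hold wealth stays at 1.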

A non-traditional objective (in computational finance) is to develop online trading
strategies that are in some sense always guaranteed to perform well.^4 Within a line of
research pioneered by Cover (Cover & Gluss, 1986; Cover, 1991; Cover & Ordentlich, 1996)
one attempts to design portfolio selection algorithms that can provably do well (in terms
of their total return) with respect to some online or offline benchmark algorithms. Two
natural online benchmark algorithms are the uniform buy-and-hold U-BAH, and the uniform
constant rebalancing strategy U-CBAL, which is CBAL_b with $b = (1/m, \ldots, 1/m)$. A natural
offline benchmark is BAH* and a more challenging offline benchmark is CBAL*.

A portfolio selection algorithm $A$ is called universal if for every market sequence $X$ over
$n$ days, it guarantees a sub-exponential ratio (in $n$) between $ret_X(\mathrm{CBAL}^*)$ and its own
return $ret_X(A)$. In particular, Cover and Ordentlich's Universal Portfolios algorithm (Cover,
1991; Cover & Ordentlich, 1996), denoted here by UNIVERSAL, was proven to be universal;
more specifically, for every market sequence $X$ of $m$ stocks over $n$ days, it guarantees the
sub-exponential (indeed polynomial) ratio

$$ret_X(\mathrm{CBAL}^*) / ret_X(\mathrm{UNIVERSAL}) = O\left(n^{\frac{m-1}{2}}\right). \qquad (1)$$

From a theoretical perspective this is surprising as this performance ratio is bounded by
a polynomial in $n$ (for fixed $m$) whereas CBAL* is capable of exponential returns. From a
practical perspective, this bound is not very useful because the empirical returns observed
for CBAL* portfolios are often not exponential in the number of trading days. However, the
motivation that underlies the potential of CBAL algorithms is useful! We follow this
motivation and develop a new algorithm which we call ANTICOR. By attempting to systematically
follow the constant rebalancing philosophy, ANTICOR is capable of some extraordinary
performance in the absence of transaction costs, or even with very small transaction costs.



4. A trading strategy is online if it computes the portfolio for the $(t+1)$st day using only market
information for the first $t$ days. This is in contrast to offline algorithms such as BAH*, CBAL* and
the optimal strategy of picking the best stock for each individual day. Such offline algorithms
compute a sequence of portfolios as a function of the entire market sequence.
3. Trying to Learn the Winners

The most direct approach to expert learning and portfolio selection is a "(reward based) 
weighted average prediction" scheme, which adaptively computes a weighted average of 
experts by gradually increasing (by some multiplicative or additive update rule) the relative 
weights of the more successful experts. In this section we briefly discuss some related 
portfolio selection results along these lines. 

For example, in the context of the PS problem consider the "exponentiated gradient"
EG($\eta$) algorithm proposed by Helmbold et al. (1998). The EG($\eta$) algorithm computes the
next portfolio to be

$$b_{t+1}(j) = \frac{b_t(j) \exp\{\eta \, x_t(j)/(b_t \cdot x_t)\}}{\sum_{k=1}^{m} b_t(k) \exp\{\eta \, x_t(k)/(b_t \cdot x_t)\}},$$

where $\eta$ is a "learning rate" parameter. EG was designed to greedily choose the best portfolio
for yesterday's market $x_t$ while at the same time paying a penalty for moving far from
yesterday's portfolio. For a universal bound on EG, Helmbold et al. set
$\eta = 2 x_{\min} \sqrt{2(\log m)/n}$ where $x_{\min}$ is a lower bound on any price relative.^5 It is easy
to see that as $n$ increases, $\eta$ decreases to 0, so that we can think of $\eta$ as being very small in
order to achieve universality. When $\eta = 0$, the algorithm EG($\eta$) degenerates to the uniform CBAL
(assuming we started with a uniform portfolio), which is not a universal algorithm. It is also the
case that if each day the price relatives for all stocks were identical, then EG (as well as other
PS algorithms) will converge to the uniform CBAL. Combining a small learning rate with a "reasonably
balanced" market we expect the performance of EG to be similar to that of the uniform
CBAL and this is confirmed by our experiments (see Table 1).^6
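A minimal sketch of one EG($\eta$) update, written directly from the multiplicative rule stated above, may help make the role of $\eta$ concrete. The function name `eg_update` and the example market vector are our own illustrative choices.

```python
from math import exp

def eg_update(b, x, eta=0.01):
    """One EG(eta) step: reweight b by exp(eta * x_j / (b . x)) and renormalize."""
    ret = sum(bj * xj for bj, xj in zip(b, x))  # yesterday's return b . x
    weights = [bj * exp(eta * xj / ret) for bj, xj in zip(b, x)]
    total = sum(weights)
    return [wj / total for wj in weights]

b = [0.5, 0.5]
x = [1.2, 0.8]                    # illustrative market vector: stock 0 did better
b_small = eg_update(b, x)         # eta = 0.01 barely moves away from uniform
b_zero = eg_update(b, x, eta=0.0) # eta = 0 leaves the portfolio unchanged
```

With a small learning rate the update shifts only a tiny amount of weight towards yesterday's better stock, which is consistent with the observation that EG behaves much like the uniform CBAL.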

Cover's universal algorithms adaptively learn each day's portfolio by increasing the
weights of successful CBALs. The update rule for these universal algorithms is

$$b_{t+1} = \frac{\int b \; ret_t(\mathrm{CBAL}_b) \, d\mu(b)}{\int ret_t(\mathrm{CBAL}_b) \, d\mu(b)},$$

where $\mu(\cdot)$ is some prior distribution over portfolios. Thus, the weight of a possible portfolio
is proportional to its total return $ret_t(b)$ thus far times its prior. The particular universal
algorithm we consider in our experiments uses the Dirichlet prior (with parameters
$(\frac{1}{2}, \ldots, \frac{1}{2})$) (Cover & Ordentlich, 1996).^7 Somewhat surprisingly, as noted in
(Cover & Ordentlich, 1996), the algorithm is equivalent to a static weighted average (given by
$\mu(b)$) over all CBALs (see also Borodin & El-Yaniv, 1998, p. 291). This equivalence helps to demystify
the universality result and also shows that the algorithm can never outperform CBAL*.
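The update rule above is an integral over the portfolio simplex. One simple way to get a feel for it is a Monte Carlo approximation: sample portfolios from the Dirichlet$(\frac{1}{2}, \ldots, \frac{1}{2})$ prior and average them, weighted by the return each would have earned as a CBAL. This is only an illustrative sketch under that sampling assumption, not the computation used in the experiments reported here; the names, sample count and seed are arbitrary.

```python
import random

def sample_dirichlet_half(m, rng):
    """Draw one portfolio from Dirichlet(1/2, ..., 1/2) via normalized gamma variates."""
    g = [rng.gammavariate(0.5, 1.0) for _ in range(m)]
    s = sum(g)
    return [v / s for v in g]

def universal_portfolio(market, m, samples=2000, seed=0):
    """Monte Carlo estimate of b_{t+1}: return-weighted average of sampled CBALs."""
    rng = random.Random(seed)
    numer = [0.0] * m
    denom = 0.0
    for _ in range(samples):
        b = sample_dirichlet_half(m, rng)
        wealth = 1.0
        for x in market:  # return of CBAL_b on the market seen so far
            wealth *= sum(bj * xj for bj, xj in zip(b, x))
        denom += wealth
        for j in range(m):
            numer[j] += b[j] * wealth
    return [v / denom for v in numer]

# Illustrative market: stock 0 doubles every day, stock 1 is flat.
market = [(2.0, 1.0)] * 10
b_next = universal_portfolio(market, m=2)
```

Because the result is a weighted average of constant rebalanced portfolios, it leans towards the portfolios that did well but cannot exceed the best of them, in line with the equivalence noted above.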



5. Helmbold et al. show how to eliminate the need to know $x_{\min}$ and $n$. While EG can be made
universal, its performance ratio is only sub-exponential (and not polynomial) in $n$.

6. Following Helmbold et al. we fix $\eta = 0.01$ in our experiments. Additional experiments, for a wide
range of fixed $\eta$ settings, confirm that for our datasets the choice of $\eta = 0.01$ is an optimal or
near optimal choice. Of course, it is possible to adaptively set $\eta$ throughout the trading period,
but that is beyond the scope of this paper.

7. The papers (Cover, 1991; Cover & Ordentlich, 1996; Blum & Kalai, 1998) consider a simpler version
of this algorithm where the (Dirichlet) prior is uniform. This algorithm is also universal and achieves
a ratio $O(n^{m-1})$. Experimentally (on our datasets) there is a negligible difference between these two
variants and here we only report on the results of the asymptotically optimal algorithm.
A different type of "winner learning" algorithm can be obtained from any sequence
prediction strategy, as noted in (Borodin, El-Yaniv, & Gogan, 2000). For each stock $j$, a
(soft) sequence prediction algorithm provides a probability $p(j)$ that the next symbol will
be $j \in \{1, \ldots, m\}$. We view this as a prediction that stock $j$ will have the best relative
price for the next day and set $b_{t+1}(j) = p(j)$. The paper (Borodin et al., 2000) considers
predictions made using the prediction component of the well-known Lempel-Ziv (LZ) lossless
compression algorithm (Ziv & Lempel, 1978). This prediction component is nicely described
in (Langdon, 1983) and in (Feder, 1991). As a prediction algorithm, LZ is provably powerful
in various senses. First, it can be shown that it is asymptotically optimal with respect to any
stationary and ergodic finite order Markov source (Rissanen, 1983; Ziv & Lempel, 1978).
Moreover, Feder shows that LZ is also universal in a worst case sense with respect to the
(offline) benchmark class of all finite state prediction machines. To summarize, the common
approach to devising PS algorithms has been to attempt to learn winners using simple or
more sophisticated winner learning schemes.

4. The Anticor Algorithm 

We propose a different approach, motivated by a CBAL-inspired "philosophy". How can we
interpret the success of the uniform CBAL on the Cover and Gluss example of Section 2?
Clearly, the uniform CBAL here is taking advantage of price fluctuation by constantly
transferring wealth from the high performing stock to the relatively low performing stock. Even
in a less contrived market, a CBAL is capable of large returns. A market model favoring
the use of a CBAL is one in which stock growth rates are stable in the long term and
occasional larger return rates will be followed by smaller rates (and vice versa). This market
phenomenon is sometimes called "reversal to the mean".

There are many ways that one can interpret and implement algorithms based on the
philosophy of "reversal to the mean". In particular, any CBAL can be viewed as a static
implementation of this philosophy. We now describe the motivation and basic ingredients in
our ANTICOR algorithm which adaptively (based on recent empirical statistics) and rather
aggressively^8 implements "reversal to the mean".

For a given trading day, consider the most recent past $w$ trading days, where $w$ is some
integer parameter. The growth rate of any stock $i$ during this window of time is measured
by the product of relative prices during this window.^9 Motivated by the assumption that we
have a portfolio of stocks that are all performing similarly in terms of long term growth rates,
ANTICOR's first condition for transferring money from stock $i$ to stock $j$ is that the growth
rate for stock $i$ exceeds that of stock $j$ in this most recent window of time.^10 In addition,
the ANTICOR algorithm requires some indication that stock $j$ will start to emulate the past
growth of stock $i$ in the near future. To this end, ANTICOR requires a positive correlation
between stock $i$ during the second last window and stock $j$ during the last window. The
relative extent to which we will transfer money from stock $i$ to stock $j$ will depend on



8. Our ANTICOR algorithm is aggressive (say, compared to CBAL) in the sense that it can transfer all 
assets out of a given stock. Various heuristics can be used to moderate this behavior. 

9. Since we would rather deal with arithmetic instead of geometric means we will use the logarithms of 
relative prices. 

10. Note that here the underlying model assumption is reversal to the same mean. One can modify the
algorithm so as to account for different means.
the strength of this correlation as well as the strength of the "self anti-correlations" for
stocks $i$ and $j$ (again in two consecutive windows). ANTICOR is so named because we use
these correlations and anticorrelations in consecutive windows to indicate the potential for
anticorrelations of the growth rates for stocks $i$ and $j$ in the near future (with hopefully the
growth rate of stock $j$ becoming greater than that of stock $i$).
Formally, we define

$$LX_1 = (\log(x_{t-2w+1}), \ldots, \log(x_{t-w}))^T \quad \text{and} \quad LX_2 = (\log(x_{t-w+1}), \ldots, \log(x_t))^T, \qquad (2)$$

where $\log(x_k)$ denotes $(\log(x_k(1)), \ldots, \log(x_k(m)))$. Thus, $LX_1$ and $LX_2$ are the two vector
sequences (equivalently, two $w \times m$ matrices) constructed by taking the logarithm over the
market subsequences corresponding to the time windows $[t-2w+1, t-w]$ and $[t-w+1, t]$,
respectively. We denote the $j$th column of $LX_k$ by $LX_k(j)$. Let $\mu_k = (\mu_k(1), \ldots, \mu_k(m))$
be the vectors of averages of columns of $LX_k$. Similarly, let $\sigma_k$ be the vector of standard
deviations of columns of $LX_k$. The cross-correlation matrix (and its normalization) between
column vectors in $LX_1$ and $LX_2$ are defined as^11

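$$M_{cov}(i,j) = \frac{1}{w-1} (LX_1(i) - \mu_1(i))^T (LX_2(j) - \mu_2(j));$$

$$M_{cor}(i,j) = \begin{cases} \dfrac{M_{cov}(i,j)}{\sigma_1(i)\,\sigma_2(j)} & \sigma_1(i), \sigma_2(j) \neq 0; \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$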

$M_{cor}(i,j) \in [-1, 1]$ measures the correlation between log-relative prices of stock $i$ over the
first window and stock $j$ over the second window. We note that if $\sigma_1(i)$ (respectively,
$\sigma_2(j)$) is zero over some window then the growth rate of stock $i$ during the second last
window (respectively, stock $j$ during the last window) is constant during this window. For
sufficiently large windows of time constant growth of any stock $i$ is unlikely. However, in
this unlikely case we choose not to move money into or out of such a stock $i$.^12

For each pair of stocks $i$ and $j$ we compute $claim_{i \to j}$, the extent to which we want to shift
our investment from stock $i$ to stock $j$. Namely, there is such a claim iff $\mu_2(i) > \mu_2(j)$ and
$M_{cor}(i,j) > 0$, in which case $claim_{i \to j} = M_{cor}(i,j) + A(i) + A(j)$ where $A(h) = |M_{cor}(h,h)|$ if
$M_{cor}(h,h) < 0$, else 0. Following our interpretation for the success of a CBAL, $M_{cor}(i,j) > 0$
is used to predict that stocks $i$ and $j$ will be correlated in consecutive windows (i.e. the
current window and the next window based on the evidence for the last two windows) and
$M_{cor}(h,h) < 0$ predicts that stock $h$ will be negatively auto-correlated over consecutive
windows. Finally, $b_{t+1}(i) = \hat{b}_t(i) + \sum_{j \neq i} [transfer_{j \to i} - transfer_{i \to j}]$ where
$transfer_{i \to j} = \hat{b}_t(i) \cdot claim_{i \to j} / \sum_j claim_{i \to j}$. A pseudocode summarizing the
ANTICOR algorithm appears in Figure 1. The pseudocode describes the routine
ANTICOR($w, t, X_t, \hat{b}_t$) that receives a window size $w$, the current trading day $t$, the historical
market sequence $X_t$ (giving the market vectors corresponding to days $1, \ldots, t$) and the current
portfolio $\hat{b}_t$, defined to be
$\hat{b}_t = \frac{1}{b_t \cdot x_t} (b_t(1) x_t(1), \ldots, b_t(m) x_t(m))$. The routine is first called with
an empty historical market sequence and with $\hat{b}_t$ being the uniform portfolio (over $m$ stocks).
The routine



11. Recall that the correlation coefficient is a normalized covariance with the covariance divided by
the product of the standard deviations; that is, $Cor(X, Y) = Cov(X, Y)/(std(X) \cdot std(Y))$ where
$Cov(X, Y) = E[(X - mean(X))(Y - mean(Y))]$.

12. Of course, other approaches can be used to accommodate constant or nearly constant growth rate. 
returns the new portfolio, to which we should rebalance at the start of the $(t+1)$st trading
day.





Algorithm ANTICOR($w, t, X_t, \hat{b}_t$)

Input:
  1. $w$: Window size
  2. $t$: Index of last trading day
  3. $X_t = x_1, \ldots, x_t$: Historical market sequence
  4. $\hat{b}_t$: current portfolio (by the end of trading day $t$)

Output: $b_{t+1}$: Next day's portfolio

  1. Return the current portfolio $\hat{b}_t$ if $t < 2w$.
  2. Compute $LX_1$ and $LX_2$ as defined in Equation (2), and $\mu_1$ and $\mu_2$, the (vector)
     averages of $LX_1$ and $LX_2$, respectively.
  3. Compute $M_{cor}(i,j)$ as defined in Equation (3).
  4. Calculate claims: for $1 \le i, j \le m$, initialize $claim_{i \to j} = 0$.
  5. If $\mu_2(i) > \mu_2(j)$ and $M_{cor}(i,j) > 0$ then
     (a) $claim_{i \to j} = claim_{i \to j} + M_{cor}(i,j)$;
     (b) if $M_{cor}(i,i) < 0$ then $claim_{i \to j} = claim_{i \to j} - M_{cor}(i,i)$;
     (c) if $M_{cor}(j,j) < 0$ then $claim_{i \to j} = claim_{i \to j} - M_{cor}(j,j)$;
  6. Calculate new portfolio: Initialize $b_{t+1} = \hat{b}_t$. For $1 \le i, j \le m$
     (a) Let $transfer_{i \to j} = \hat{b}_t(i) \cdot claim_{i \to j} / \sum_j claim_{i \to j}$;
     (b) $b_{t+1}(i) = b_{t+1}(i) - transfer_{i \to j}$;
     (c) $b_{t+1}(j) = b_{t+1}(j) + transfer_{i \to j}$;

Figure 1: Algorithm ANTICOR
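The routine of Figure 1 can be sketched in plain Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the function name `anticor_step` is ours, the standard deviations are assumed to use the same $1/(w-1)$ normalization as $M_{cov}$ (so that $M_{cor}$ stays in $[-1, 1]$), rows with no claims are left untouched, and $w \ge 2$ is assumed.

```python
from math import log, sqrt

def anticor_step(w, market, b_hat):
    """One ANTICOR step. market = [x_1, ..., x_t]; b_hat = current portfolio."""
    t, m = len(market), len(b_hat)
    if t < 2 * w:  # not enough history for two windows
        return list(b_hat)
    lx1 = [[log(v) for v in x] for x in market[t - 2 * w:t - w]]
    lx2 = [[log(v) for v in x] for x in market[t - w:]]

    def column_stats(lx):
        cols = [[row[j] for row in lx] for j in range(m)]
        mu = [sum(c) / w for c in cols]
        # Assumption: sample standard deviation with a (w - 1) denominator.
        sigma = [sqrt(sum((v - mj) ** 2 for v in c) / (w - 1))
                 for c, mj in zip(cols, mu)]
        return cols, mu, sigma

    c1, mu1, s1 = column_stats(lx1)
    c2, mu2, s2 = column_stats(lx2)

    def m_cor(i, j):
        if s1[i] == 0 or s2[j] == 0:  # constant growth: no correlation defined
            return 0.0
        cov = sum((a - mu1[i]) * (b - mu2[j])
                  for a, b in zip(c1[i], c2[j])) / (w - 1)
        return cov / (s1[i] * s2[j])

    cor = [[m_cor(i, j) for j in range(m)] for i in range(m)]
    claim = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if mu2[i] > mu2[j] and cor[i][j] > 0:
                # Add |M_cor(h, h)| for each negatively auto-correlated stock.
                claim[i][j] = cor[i][j] + max(0.0, -cor[i][i]) + max(0.0, -cor[j][j])

    b_next = list(b_hat)
    for i in range(m):
        total = sum(claim[i])
        if total > 0:
            for j in range(m):
                transfer = b_hat[i] * claim[i][j] / total
                b_next[i] -= transfer
                b_next[j] += transfer
    return b_next

# Illustrative two-stock market with w = 2 (made-up price relatives).
market = [[1.0, 1.0], [1.1, 0.9], [1.2, 0.8], [1.3, 0.9]]
b_next = anticor_step(2, market, [0.5, 0.5])
```

In this toy market stock 0 outgrew stock 1 in the last window and the cross-correlation is positive, so the whole holding in stock 0 is moved to stock 1. This illustrates the aggressive behavior mentioned in footnote 8.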

Our ANTICOR_w algorithm has one critical parameter, the window size $w$. In Figure 2
we depict the total return of ANTICOR_w on four historical datasets as a function of the
window size $w = 2, \ldots, 30$ (detailed descriptions of these datasets appear in Section 5). As
we might expect, the performance of ANTICOR_w depends significantly on the window size.
However, for all $w$, ANTICOR_w beats the uniform market and, moreover, it beats the best
stock using most window sizes. Of course, in online trading we cannot choose $w$ in hindsight.
Viewing the ANTICOR_w algorithms as experts, we can try to learn the best expert. But the
windows, like individual stocks, induce a rather volatile set of experts and standard expert
combination algorithms (Cesa-Bianchi et al., 1997) tend to fail.^13

Alternatively, we can adaptively learn and invest in some weighted average of all ANTICOR_w
algorithms with $w$ less than some maximum $W$. The simplest case is a uniform investment
on all the windows; that is, a uniform buy-and-hold investment on the algorithms
ANTICOR_w, $w \in [2, W]$, denoted by BAH_W(ANTICOR). Figure 3 graphs the total return of
BAH_W(ANTICOR) as a function of $W$ for all values of $2 \le W \le 50$ for the four datasets we
consider here. Considering these graphs, our choice of $W = 30$ was arbitrary but clearly not


13. This assertion is based on empirical studies we conducted with various 'expert advice' algorithms. 
[Figure 2, four panels: "NYSE: Anticor vs. window size" (a); "TSX: Anticor vs. window size" (b);
"SP500: Anticor vs. window size" (c); "DJIA: Anticor vs. window size" (d). Horizontal axis: Window
Size (w). Legend: BAH(Anticor), Anticor, Best Stock, Market.]

Figure 2: ANTICOR_w's total return (per $1 investment) vs. window size 2 ≤ w ≤ 30 for
(a) NYSE; (b) TSX; (c) SP500; (d) DJIA. The dashed (red) lines represent the
final return of the best stock and the dash-dotted (blue) lines, the final return
of the (uniform) market. The dotted (green) horizontal lines represent a uniform
investment on a number of ANTICOR_w applications as later described.



optimal. Of course, we could try to optimize the parameter W for each particular dataset 
by training the algorithm on historical data before beginning to trade. However, our claim 
is that almost any choice of W will yield returns that beat the best stock (the only exception 
is W = 2 in the DJIA dataset). 
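The uniform buy-and-hold over window sizes is simple to state in code: split the initial dollar equally among the experts and let each share grow with its own daily returns. The sketch below assumes the daily return factors of each expert (for instance each ANTICOR_w) have already been computed; the function name and the numbers are illustrative only.

```python
def bah_over_experts(expert_daily_returns):
    """Cumulative wealth of a uniform buy-and-hold over several experts.

    expert_daily_returns[k][t] is the daily return factor of expert k on day t.
    Returns the combined wealth after each day, starting from 1 dollar.
    """
    k = len(expert_daily_returns)
    wealth = [1.0 / k] * k          # equal initial stake in every expert
    history = []
    for day in zip(*expert_daily_returns):
        wealth = [w_i * r for w_i, r in zip(wealth, day)]
        history.append(sum(wealth))
    return history

# Two made-up experts: one doubles daily, one halves daily.
history = bah_over_experts([[2.0, 2.0], [0.5, 0.5]])
```

Because the stakes are never rebalanced, the combined wealth is just the average of the experts' cumulative returns, so it is dominated over time by the better performing window sizes.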

Since we now consider the various algorithms as stocks (whose prices are determined by
the cumulative returns of the algorithms), we are back to our original portfolio selection
problem and if the ANTICOR algorithm performs well on stocks it may also perform well on
algorithms. We thus consider active investment in the various ANTICOR_w algorithms using
ANTICOR. We again consider all windows $w \le W$. Of course, we can continue to compound
the algorithm any number of times. Here we compound twice and then use a buy-and-hold
investment. The resulting algorithm is denoted BAH_W(ANTICOR(ANTICOR)). One impact of
this compounding, depicted in Figure 4, is to smooth out the anti-correlations exhibited in
the stocks. It is evident that after compounding twice the returns become almost completely
[Figure 3, four panels: "NYSE: Total Return vs. Max Window" (a); "TSX: Total Return vs Max Window" (b);
"SP500: Total Return vs Max Window" (c); "DJIA: Total Return vs Max Window" (d). Horizontal axis:
Maximal Window Size (W). Legend: BAH_W(Anticor), Best Stock, Market.]

Figure 3: BAH_W(ANTICOR)'s total return (per $1 investment) as a function of the maximal
window W: NYSE (a); TSX (b); SP500 (c); DJIA (d).



correlated, thus diminishing the possibility that additional compounding will substantially
help.^14 This idea for smoothing critical parameters may be applicable in other learning
applications. The challenge is to understand the conditions and applications in which the
process of compounding algorithms will have this smoothing effect.

5. An Empirical Comparison of the Algorithms 

We present an experimental study of the ANTICOR algorithm and the three online
learning algorithms described in Section 3. We focus on BAH_30(ANTICOR), abbreviated by
ANTI^1, and BAH_30(ANTICOR(ANTICOR)), abbreviated by ANTI^2. Four historical datasets are
used. The first, the NYSE dataset, is the one used in (Cover, 1991; Cover & Ordentlich, 1996;
Helmbold et al., 1998) and (Blum & Kalai, 1998). This dataset contains 5651 daily prices
for 36 stocks in the New York Stock Exchange (NYSE) for the twenty-two year period July
3rd, 1962 to Dec 31st, 1984. The second, the TSX dataset, consists of 88 stocks from the Toronto
Stock Exchange (TSX), for the five year period Jan 4th, 1994 to Dec 31st, 1998. The third



14. This smoothing effect also allows for the use of simple prediction algorithms such as "expert advice" 
algorithms (Cesa-Bianchi et al., 1997), which can now better predict a good window size. We have not 
explored this direction. 
[Figure 4: plot titled "DJIA: Dec 14, 2002 - Jan 14, 2003", three panels. Horizontal axis: Days.]

Figure 4: Cumulative returns for last month of the DJIA dataset: stocks (left panel);
ANTICOR_w algorithms trading the stocks (denoted ANTICOR^1, middle panel);
ANTICOR_w algorithms trading the ANTICOR algorithms (right panel).



dataset consists of the 25 stocks from SP500 which (as of Apr. 2003) had the largest market
capitalization. This set spans 1276 trading days for the period Jan 2nd, 1998 to Jan 31st,
2003. The fourth dataset consists of the thirty stocks composing the Dow Jones Industrial
Average (DJIA) for the two year period (507 days) from Jan 14th, 2001 to Jan 14th, 2003.^15



Algorithm        NYSE              TSX     SP500   DJIA    NYSE^-1    TSX^-1   SP500^-1   DJIA^-1
Market (U-BAH)   14.49             1.61    1.34    0.76    0.11       1.67     0.87       1.43
Best Stock       54.14             6.27    3.77    1.18    0.32       37.64    1.65       2.77
CBAL*            250.59            6.77    4.06    1.23    2.86       58.61    1.91       2.97
U-CBAL           27.07             1.59    1.64    0.81    0.22       1.18     1.09       1.53
ANTI^1           17,059,811.56     26.77   5.56    1.59    246.22     7.12     6.61       3.67
ANTI^2           238,820,058.10    39.07   5.88    2.28    1383.78    7.27     9.69       4.60
LZ               79.78             1.32    1.67    0.89    5.41       4.80     1.20       1.83
EG               27.08             1.59    1.64    0.81    0.22       1.19     1.09       1.53
UNIVERSAL        26.99             1.59    1.62    0.80    0.22       1.19     1.07       1.53



Table 1: Monetary returns in dollars (per $1 investment) of various algorithms for four 
different datasets and their reversed versions. The winner and runner-up for each 
market appear in boldface. All figures are truncated to two decimals. 



These four datasets are quite different in nature (the market returns for these datasets 
appear in the first row of Table 1). While every stock in the NYSE increased in value, 32 
of the 88 stocks in the TSX lost money, 7 of the 25 stocks in the SP500 lost money and 



15. The four datasets, including their sources and individual stock compositions can be downloaded from 
http://www.cs.technion.ac.il/~rani/portfolios. 
25 of the 30 stocks in the "negative market" DJIA lost money. With the exception of the
TSX, these datasets include only highly liquid stocks with large market capitalizations. In
order to maximize the utility of these datasets and yet present rather different markets, we
also ran each market in reverse. This is simply done by reversing the order and inverting
the relative prices. The reverse datasets are denoted by a '-1' superscript. Some of the
reverse markets are particularly challenging. For example, all of the NYSE^-1 stocks are
going down. Note that the forward and reverse markets (i.e. U-BAH) for the TSX are both
increasing but that the TSX^-1 is also a challenging market since so many stocks (56 of 88)
are declining.
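The construction of a reverse market, as described above, amounts to two operations on the sequence of market vectors. A minimal sketch, with a hypothetical function name and made-up data:

```python
def reverse_market(market):
    """Reverse the order of the trading days and invert every relative price."""
    return [[1.0 / v for v in x] for x in reversed(market)]

# Made-up two-day market for two stocks.
market = [[2.0, 0.5], [1.0, 4.0]]
reversed_market = reverse_market(market)
```

Here the first stock doubles over the forward market and therefore halves over the reversed one, which is why a market in which every stock rises becomes one in which every stock falls.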

Table 1 reports on the total returns of the various algorithms for all eight datasets. We
see that prediction algorithms such as LZ can do quite well and the more aggressive ANTI^1
and ANTI^2 have excellent and sometimes fantastic returns. Note that these active strategies
beat the best stock and even CBAL* in all markets with the exception of the TSX^-1, in which
case they still significantly outperform the market. The reader may well be distrustful of
what appears to be such unbelievable returns for ANTI^1 and ANTI^2, especially when applied
to the NYSE dataset. However, recall that the NYSE dataset consists of $n = 5651$ trading
days and the $y$ such that $y^n$ = the total NYSE return is approximately 1.0029511 for
ANTI^1 (respectively, 1.0074539 for ANTI^2); that is, the average daily increase is less than
.3% (respectively, .75%). We observe that learning algorithms such as UNIVERSAL and EG
have no substantial advantage over U-CBAL. Some previous expositions of these algorithms
highlighted particular combinations of stocks where the returns significantly outperformed
the best stock. But the same can be said for U-CBAL.
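The per-day growth factor $y$ quoted above is the $n$th root of the total return. As a small worked check, using the ANTI^1 total return for the NYSE from Table 1 (the function name is ours):

```python
def average_daily_factor(total_return, n_days):
    """Return y such that y ** n_days equals the total return."""
    return total_return ** (1.0 / n_days)

# ANTI^1 on the NYSE dataset: total return 17,059,811.56 over 5651 trading days.
y = average_daily_factor(17_059_811.56, 5651)
```

The result is about 1.00295, an average daily increase of under 0.3%, which shows how a modest daily edge compounds into a very large total return over more than twenty years.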



[Figure 5: plot titled "DJIA: Cumulative Total Returns".]

Figure 5: DJIA: Cumulative returns of ANTI^1, ANTI^2, the best stock and a uniform BAH
(the "market").



The total returns of ANTI^1 and ANTI^2 presented in Table 1 are impressive but are far
from telling a complete story. Consider the graphs in Figure 6. While both ANTI^1 and ANTI^2
perform well with respect to the uniform market and the best stock throughout most of the
investment period, there are some periods where the cumulative return of these strategies
decreases. This (not surprising) behavior indicates that there is a certain degree of risk in
using these investment algorithms.

In finance the standard risk measure is the standard deviation of the return. In Table 2 
we provide annualized returns and risks as well as risk- adjusted returns for all markets 
and algorithms considered here. 16 For example, the annualized return of the best stock in 
the DJIA set is 8.6%, its annualized risk (standard deviation) is 42% and its annualized 
risk-adjusted return (Sharpe ratio) is 11%. 
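
As a rough illustration of these quantities, the sketch below follows the description in footnote 16: geometric mean of the daily returns, standard deviation scaled by √252, and a 4% risk-free rate. The function names, the exact compounding convention and the use of the sample standard deviation are assumptions made for this example, not details taken from the paper.

```python
import math
import statistics

TRADING_DAYS = 252   # assumed number of trading days per year (footnote 16)
RISK_FREE = 0.04     # risk-free return taken to be 4% (footnote 16)


def annualized_return(daily_relatives):
    """Compound the geometric mean of daily price relatives over one year.

    Assumption: 'daily_relatives' holds wealth ratios such as 1.01 for +1%.
    """
    log_sum = sum(math.log(r) for r in daily_relatives)
    geo_mean = math.exp(log_sum / len(daily_relatives))
    return geo_mean ** TRADING_DAYS - 1.0


def annualized_risk(daily_relatives):
    """Standard deviation of daily returns, scaled by sqrt(252).

    Assumption: the sample standard deviation is used.
    """
    daily_returns = [r - 1.0 for r in daily_relatives]
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(ann_return, ann_risk, risk_free=RISK_FREE):
    """Annualized return in excess of the risk-free rate, per unit of risk."""
    return (ann_return - risk_free) / ann_risk


if __name__ == "__main__":
    # Best stock in the DJIA set: 8.6% return, 42% risk, as quoted in the text.
    print(f"Sharpe ratio: {sharpe_ratio(0.086, 0.42):.2%}")
```

With the figures quoted for the best DJIA stock, (8.6% - 4%) / 42% comes to roughly 11%, in agreement with the value given in the text.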



Algorithm        NYSE        TSX         SP500       DJIA        NYSE^-1     TSX^-1      SP500^-1    DJIA^-1

Market (U-BAH)   12 ± 14%    10 ± 12%    5 ± 24%     -12 ± 24%   -9 ± 15%    10 ± 22%    -2 ± 22%    19 ± 25%
  Sharpe         58%         46%         8%          -67%        -86%        29%         -28%        61%

Best Stock       19 ± 24%    44 ± 55%    30 ± 51%    8 ± 42%     -4 ± 21%    106 ± 104%  10 ± 32%    65 ± 114%
  Sharpe         63%         73%         50%         11%         -41%        98%         20%         54%

CBAL*            27 ± 30%    46 ± 40%    31 ± 42%    11 ± 26%    4 ± 40%     125 ± 78%   13 ± 27%    71 ± 76%
  Sharpe         78%         106%        65%         27%         1%          156%        35%         88%

U-CBAL           15 ± 13%    9 ± 13%     10 ± 22%    -9 ± 25%    -6 ± 13%    3 ± 13%     1 ± 21%     23 ± 25%
  Sharpe         88%         44%         28%         -54%        -77%        -3%         -9%         77%

ANTI^1           110 ± 28%   93 ± 45%    40 ± 37%    26 ± 35%    27 ± 27%    48 ± 41%    45 ± 32%    90 ± 31%
  Sharpe         367%        196%        95%         62%         86%         107%        126%        277%

ANTI^2           136 ± 35%   108 ± 60%   41 ± 44%    50 ± 39%    38 ± 33%    48 ± 46%    56 ± 36%    113 ± 35%
  Sharpe         370%        172%        86%         119%        101%        96%         143%        304%

LZ               21 ± 23%    5 ± 25%     10 ± 25%    -5 ± 28%    7 ± 21%     36 ± 27%    3 ± 26%     35 ± 27%
  Sharpe         76%         6%          25%         -33%        17%         117%        -0.8%       112%

EG               15 ± 13%    9 ± 13%     10 ± 22%    -9 ± 25%    -6 ± 13%    3 ± 13%     1 ± 22%     23 ± 25%
  Sharpe         88%         44%         28%         -54%        -77%        -2%         -9%         77%

UNIVERSAL        15 ± 13%    9 ± 13%     10 ± 22%    -9 ± 25%    -6 ± 13%    3 ± 13%     1 ± 22%     23 ± 25%
  Sharpe         87%         44%         27%         -55%        -77%        -2%         -11%        76%



Table 2: Annualized returns and respective annualized volatilities (return ± risk) as well as 
annualized risk-adjusted returns (Sharpe Ratio) of the various algorithms over the four 
datasets and their reversed versions. The winner and runner-up Sharpe Ratio for each 
market appear in boldface. All figures are truncated to two decimals. 



6. On Commissions, Trading Friction and Other Caveats 

When handling a portfolio of m stocks our algorithm may perform up to m transactions 
per day. A major concern is therefore the commissions it will incur. Within the proportional 
commission model (see e.g. Blum & Kalai, 1998; Borodin & El-Yaniv, 1998, Section 
14.5.4) there exists a fraction γ ∈ (0, 1) such that an investor pays at a rate of γ/2 for 
each buy and for each sell. Therefore, the return of a sequence b_1, ..., b_n of portfolios 
with respect to a market sequence x_1, ..., x_n is 

    \prod_t \left( b_t \cdot x_t \left( 1 - \sum_j \frac{\gamma}{2} \left| b_t(j) - \tilde{b}_t(j) \right| \right) \right),

where 

    \tilde{b}_t = \frac{1}{b_t \cdot x_t} \left( b_t(1) x_t(1), \ldots, b_t(m) x_t(m) \right). 17

Our investment algorithm in its simplest form 
can tolerate very small proportional commission rates and still beat the best stock. The 
graphs in Figure 6 depict the total returns of BAH_30(ANTICOR) with proportional commission 
factor γ = 0.1%, 0.2%, ..., 1%. The strategy can withstand small commission factors. 
For example, with γ = 0.1% the algorithm still beats the best stock in all four markets we 
consider (and it beats the market with γ < 0.4%). Moreover it still clearly beats the market 
whenever γ < 0.^ 

16. The annualized return is estimated using the geometric mean of the individual daily returns and the risk 
is the standard deviation of these daily returns multiplied by √252, where 252 is the assumed standard 
number of trading days per year. These calculations are standard. The (annualized) Sharpe ratio 
(Sharpe, 1975) is the ratio of annualized return minus the risk-free return (taken to be 4%) divided by 
the (annualized) standard deviation. 

Figure 6: Total returns of BAH_30(ANTICOR) with proportional commissions γ = 
0.1%, 0.2%, ..., 1%. 
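
The commission-adjusted return described in this section can be sketched in a few lines. Each day's growth b_t · x_t is discounted by γ/2 times the total absolute difference between the portfolio b_t and the portfolio that results once prices have moved (each component b_t(j) x_t(j), normalized by b_t · x_t). The function name and the list-based data layout are assumptions made for this illustration; this is not the code used for the experiments.

```python
def return_with_commissions(portfolios, price_relatives, gamma):
    """Total return of a portfolio sequence under proportional commissions.

    portfolios:      list of portfolios b_t (each a list of weights summing to 1)
    price_relatives: list of market vectors x_t (same shape as portfolios)
    gamma:           proportional commission factor, e.g. 0.001 for 0.1%
    """
    total = 1.0
    for b, x in zip(portfolios, price_relatives):
        # Daily growth b_t . x_t before commissions.
        growth = sum(bj * xj for bj, xj in zip(b, x))
        # Portfolio after the price move: b_t(j) x_t(j) / (b_t . x_t).
        drifted = [bj * xj / growth for bj, xj in zip(b, x)]
        # Total absolute difference between b_t and the drifted portfolio.
        turnover = sum(abs(bj - dj) for bj, dj in zip(b, drifted))
        # Each unit of turnover is charged at rate gamma / 2.
        total *= growth * (1.0 - 0.5 * gamma * turnover)
    return total


if __name__ == "__main__":
    # Hypothetical two-stock market over a single day.
    b = [[0.5, 0.5]]
    x = [[2.0, 0.5]]
    for gamma in (0.0, 0.001, 0.01):
        print(gamma, return_with_commissions(b, x, gamma))
```

With γ = 0 the function reduces to the ordinary product of daily growths, so the commission effect can be read off by comparing runs with different values of γ.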



However, some current online brokers charge very small proportional commissions, perhaps 
in addition to a small flat commission rate for all trades. 18 This means that a large 
investor can scale up the investment and suffer only a small proportional transaction rate. 
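
To see why scaling up helps, the following sketch converts the flat fee schedule quoted in footnote 18 ($10 for any trade up to 5000 shares, then $.01 per share thereafter) into an effective proportional rate. The $50 share price and the function names are hypothetical and serve only as an illustration.

```python
def flat_fee(shares, base_fee=10.0, included_shares=5000, per_share=0.01):
    """Fee for one trade under the schedule quoted in footnote 18."""
    extra_shares = max(0, shares - included_shares)
    return base_fee + per_share * extra_shares


def effective_rate(shares, price_per_share):
    """Fee expressed as a fraction of the traded value."""
    return flat_fee(shares) / (shares * price_per_share)


if __name__ == "__main__":
    price = 50.0  # hypothetical share price, not taken from the paper
    for shares in (100, 1000, 5000, 10000):
        rate = effective_rate(shares, price)
        print(f"{shares:>6} shares: fee ${flat_fee(shares):.2f}, "
              f"effective rate {100 * rate:.4f}%")
```

Under these assumed numbers a 100-share trade pays an effective rate of 0.2% while a 5000-share trade pays 0.004%, which is the sense in which a large investor suffers only a small proportional rate.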

An additional caveat is our assumption that all trades could be implemented using the 
closing price. While in principle there is nothing special about the closing price (i.e. our 
algorithms can trade at any time during the trading day), practical considerations related 
to dataset gathering and availability dictated the use of these prices. 19 Our algorithms 
assume that all portfolio adjustments are implemented using the quoted prices they receive 
as inputs. This means that all transactions are implemented simultaneously using the 
quoted prices. With current online brokers a computerized system can issue all transaction 
orders almost instantly but there is no guarantee that they will all be implemented instantly. 
This trading "friction" will necessarily generate discrepancies between the input prices and 
implementation prices. 

17. We note that Blum and Kalai (1998) showed that the performance guarantee of UNIVERSAL still holds 
(and gracefully degrades) in the case of proportional commissions. 

18. For example, on its USA site, E*TRADE (https://us.etrade.com) offers a flat fee of $10 for any trade 
up to 5000 shares and then $.01/share thereafter. 

19. Specifically, historical closing prices are in the public domain and allow for experimental reproducibility. 
Historical intraday trading quotes can also be gathered but such data is usually protected and can be 
costly to obtain. 

A related problem that one must face when actually trading is the difference between 
bid and ask prices. These bid-ask spreads (and the availability of stocks for both buying and 
selling) are functions of stock liquidity and are typically small for large market capitalization 
stocks. We consider here only very large market cap stocks. 

Any report of abnormal returns using historical markets should be suspected of "data 
snooping". In particular, all of our historical data sets are conditioned on the fact that 
all stocks were traded every day and there were no bankruptcies or stocks that became 
virtually worthless in any of these data sets. Furthermore, when a dataset is excessively 
mined by testing many strategies there is a substantial chance that one of the strategies 
will be successful by simple over-fitting. Another data snooping hazard is stock selection. 
Our ANTICOR algorithms were fully developed using only the NYSE and TSX datasets. 
The DJIA and SP500 sets were obtained (from public domain sources) after the algorithms 
were fixed. Finally, our algorithm has one parameter (the maximal window size W). Our 
experiments clearly indicate that the algorithm's performance is robust with respect to W 
(see, for example, Figure 4). 

7. Concluding Remarks 

Traditional work in financial economics tends to focus on the understanding of stock price 
determination. The main question there is: Can we predict the stock market? Judging by 
the extensive but inconclusive work done in financial forecasting, perhaps this is not the most 
beneficial question to ask. Rather, can a computer program consistently outperform the 
market? Besides practicality, it is clear that any successful portfolio selection algorithm is in 
itself a mathematical model that can provide some new intuition on stock price formation. 
For example, in our case, the algorithms suggest that some stock price fluctuations are 
sufficiently "periodic" and anti-correlated. 

A number of well-respected works report on statistically robust "abnormal" returns 
for simple "technical analysis" heuristics, which slightly beat the market. For example, 
the landmark study of Brock, Lakonishok, and LeBaron (1992) applies 26 simple trading 
heuristics to the DJIA index from 1897 to 1986 and provides strong support for technical 
analysis heuristics. While consistently beating the market is considered a significant (if not 
impossible) challenge, our approach to portfolio selection indicates that beating the best 
stock is an achievable goal. While we have mainly focused on an idealized "frictionless 
setting", we believe that even in such a frictionless setting (which seems like a reasonable 
starting point) no such results have been previously claimed in the literature. 

The results presented here raise various interesting questions. Since simple statistical 
relations such as correlation give rise to such outstanding returns, it is plausible that various 
other, perhaps more sophisticated, machine learning techniques can give rise to better 
portfolio selection algorithms capable of larger returns and tolerating larger commission 
fees. 

On the theoretical side, what is missing at this point in time is an analytical model which 
better explains why our active trading strategies are so successful. In this regard, we are 
investigating various "statistical adversary" models along the lines suggested by Raghavan 
(1992) and Chou et al. (1995). Namely, we would like to show that an algorithm performs 
well (relative to some benchmark) for any market sequence that satisfies certain constraints 
on its empirical statistics. 

One final caveat needs to be mentioned. Namely, the entire theory of portfolio selection 
algorithms assumes that any one portfolio selection algorithm has no impact on the market! 
But just like any goose laying golden eggs, widespread use will soon lead to the end of the 
goose. In our case, the market will quickly react to any method which does consistently 
and substantially beat the market. 